Genetic diversity of a Silybum marianum (L.) Gaertn. germplasm collection revealed by DNA Diversity Array Technology (DArTseq)

Silybum marianum (L.) Gaertn. is a multipurpose crop native to the Mediterranean and Middle East regions and mainly known for the hepatoprotective properties of fruit-derived silymarin. Despite growing interest in milk thistle as a versatile crop with medicinal value, its potential in agroindustry is hindered by incomplete domestication and limited genomic knowledge, impeding the development of competitive breeding programs. The present study aimed to evaluate genetic diversity in a panel of S. marianum accessions (n = 31), previously characterized for morphological and phytochemical traits, using 5,178 polymorphic DArTseq SNP markers. The genetic structure, investigated using both parametric and non-parametric approaches (e.g. PCA, AWclust, Admixture), revealed three distinct groups reflecting geographical origins. Pop1 grouped accessions from Central Europe and the UK, Pop3 consisted mainly of accessions of Italian origin, and Pop2 included accessions from different geographical areas. Interestingly, Italian genotypes showed a divergent phenotypic distribution, particularly in fruit oleic and linoleic acid content, compared to the other two groups. Genetic differentiation among the three groups, investigated by computing the pairwise fixation index (FST), confirmed a greater differentiation of Pop3 compared to the other subpopulations, in agreement with other diversity indices (e.g. private alleles, heterozygosity). Finally, 22 markers were declared as putatively under natural selection, of which seven significantly affected important phenotypic traits such as oleic, arachidonic, behenic and linoleic acid content. These findings suggest that these markers, and above all the seven SNP markers identified within Pop3, could be exploited in specific breeding programs, potentially aimed at diversifying the use of milk thistle.
Indeed, incorporating genetic material from Pop3 haplotypes carrying the selected loci into milk thistle breeding populations might be the basis for developing milk thistle lines with higher levels of oleic, arachidonic, and behenic acids, and lower levels of linoleic acid, paving new avenues for enhancing the nutritional and agronomic characteristics of milk thistle.

The genus Silybum was described as comprising only two species, S. marianum and S. eburneum [1], and it was argued that the two forms are probably only variants of the same species [4], although this classification is still under debate [5,6]. Milk thistle has been used for more than 2000 years and is mainly cultivated in Asia and Eastern Europe as a medicinal plant owing to the phytochemical properties of its prominent compound, silymarin, a complex of bioactive flavonolignans accumulated in the seed integument at 1.5% to 4.3% of the total fruit weight [7,8]. Pharmacologically relevant actions of silymarin include hepatoprotective properties and antioxidant, anti-inflammatory, antifibrotic, hypolipidemic, neurotrophic, and neuroprotective effects [9]. Besides silymarin, fruits are also rich in oil and protein, suggesting that milk thistle may have further agrifood and industrial applications [10]. From an agronomic perspective, milk thistle is characterized by significant fruit and plant biomass yield, and its potential use for fodder, bioenergy production and phytoremediation, as well as for feed and cosmetics, is relatively unexplored [11-13].
Despite the increasing interest in S. marianum as a multipurpose crop and its recognised importance as a medicinal species, its exploitation in agroindustry systems is mainly limited by the fact that the species is not completely domesticated [6,14] and that genomic knowledge is still too poor to support a breeding program. In addition, a comprehensive understanding of the genetic variability and relationships between accessions in the available germplasm collections represents a key step in biodiversity conservation, monitoring, and exploitation [15] and thus a crucial step toward an efficient breeding program design. Many molecular marker technologies have been developed and applied to study genetic diversity in germplasm collections and breeding programs [16], including RFLPs [17], RAPDs [18], ISSRs [19], SSRs [20] and AFLPs [21]. There have been some recent attempts to investigate genetic diversity in different S. marianum collections using SCoT (Start Codon-Targeted) [22], AFLP (Amplified Fragment Length Polymorphism) [23], ISSR (Inter Simple Sequence Repeats) [24], and co-dominant insertion/deletion (InDel) [25] markers. However, their major limitations are poor genome coverage, low discrimination ability, poor reproducibility, and technical and time requirements, together with high cost per unit, making them unsuitable for high-throughput genotyping. Moreover, all these studies investigated the genetic variation of only a few accessions, coming from well-defined geographical areas such as Iran [22-24] or Korea [25].
DArT (Diversity Array Technology Pty Ltd) markers are a prime alternative as they combine high-throughput DNA array technology, restriction site polymorphism analysis, genome complexity reduction, and PCR amplification, leading to the production of thousands of polymorphic loci in a single assay [26] and thus providing a cost-effective and efficient means for plant genotyping [27-33], even in species where genome sequence information is not available [34,35]. DArT markers have been applied successfully in genomic studies in many species, including those with large and complex genomes such as barley [36], sugarcane [37], wheat [38], rye [39], oat [40], and strawberry [33]. However, while the initial DArT implementation on the microarray platform involved fluorescent labeling of representations and hybridization to dedicated DArT arrays, the current DArTseq method deploys efficient genotyping-by-sequencing platforms, which allow genome-wide marker discovery through restriction enzyme-mediated genome complexity reduction and sequencing of the restriction fragments [34].
In this study, a DArTseq approach was applied to assess the genetic diversity of 31 S. marianum accessions from Southern Europe (e.g., Italy, Spain), Central Europe and UK (e.g., Austria, Germany), and other countries worldwide (e.g., Canada, North Korea) (Table 1), providing valuable insights into milk thistle diversity collected across different continents and climates. The collection was previously characterized for phenotypic traits including fruit morphology and chemical traits such as flavonolignan and fatty acid content [5,41]. The comparison between the genotypic and phenotypic data conducted in the present study aimed firstly to better understand the origin of the accessions preserved in the germplasm bank and their botanical classification, but also to identify interesting genetic material to be used in milk thistle breeding programs.

DNA extraction and DArT sequencing
Given that a large part of the collection included accessions collected from the wild, whose genetic diversity is still unknown, DNA extraction and DArT analysis were performed on three seedlings for each of the 31 accessions, except SIL3 and G1, for which four plants were collected, and G22, with two plants, for a total of 94 samples. DNA extraction was performed as described by Martinelli et al. [14]. Each seedling is identified by a two-digit alphanumerical code; correspondence between seedling and accession is listed in Table 1. DNA samples were then processed at Diversity Array Technology (DArT) Pty, Ltd., Bruce, Australia (http://www.diversityarrays.com/). A PstI-MseI genome complexity reduction method was used, and a series of digestion-ligation reactions were performed using the protocol described in [26] with some modifications. Both PstI- and MseI-adaptors were designed to include an Illumina flowcell attachment sequence, and only PstI-MseI fragments were then amplified in a 30-cycle PCR reaction. Equimolar amounts of amplified products of each sample were then transferred to a 96-well plate, applied to a c-Bot system (Illumina) for bridge PCR amplification and finally sequenced on an Illumina Hiseq2500 (Illumina Inc., USA) for 77 cycles.
Scoring of roughly 5,200 DArTseq markers was achieved using the DArTsoft14 software plugin in the KDCompute application (http://www.kddart.org/kdcompute.html). Two types of DArTseq markers, SilicoDArT markers and SNP markers, were both scored by the provider as binary for the presence/absence (1 and 0, respectively) of the restriction fragment with the marker sequence in the genomic representation of the sample. Raw data are available in the Figshare database with the following doi: 10.6084/m9.figshare.25551132.
SilicoDArT markers were aligned to the S. marianum reference genome [3] with the BLAST tool to identify chromosome positions, retaining only hits with identity and alignment length > 95%. SNPs with unknown positions were filtered out.
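The position filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes BLAST was run with tabular output (-outfmt 6, where the third and fourth columns are percent identity and alignment length), it interprets "alignment length > 95%" as a fraction of the marker length, and the marker names, lengths and chromosome labels are hypothetical.

```python
from collections import defaultdict


def filter_hits(blast_lines, marker_lengths, min_identity=95.0, min_cover=95.0):
    """Return {marker: [(chromosome, start), ...]} for hits passing both filters."""
    kept = defaultdict(list)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        marker, chrom = fields[0], fields[1]
        identity = float(fields[2])   # percent identity (pident)
        aln_len = int(fields[3])      # alignment length (length)
        start = int(fields[8])        # subject start (sstart)
        # Assumption: coverage is alignment length relative to marker length.
        cover = 100.0 * aln_len / marker_lengths[marker]
        if identity > min_identity and cover > min_cover:
            kept[marker].append((chrom, start))
    return dict(kept)


def split_unique_multi(kept):
    """Separate markers with a single retained hit from multi-mapped ones."""
    unique = {m: hits[0] for m, hits in kept.items() if len(hits) == 1}
    multi = sorted(m for m, hits in kept.items() if len(hits) > 1)
    return unique, multi


# Toy input: four hypothetical 69-bp markers (m1..m4).
lengths = {"m1": 69, "m2": 69, "m3": 69, "m4": 69}
hits = [
    "m1\tchr04\t100.0\t69\t0\t0\t1\t69\t1000\t1068\t1e-30\t128",
    "m2\tchr01\t97.1\t69\t2\t0\t1\t69\t500\t568\t1e-25\t110",
    "m2\tchr07\t98.5\t68\t1\t0\t1\t68\t900\t967\t1e-26\t115",
    "m3\tchr02\t92.0\t69\t5\t0\t1\t69\t300\t368\t1e-15\t80",   # identity too low
    "m4\tchr03\t100.0\t40\t0\t0\t1\t40\t700\t739\t1e-12\t75",  # alignment too short
]
unique, multi = split_unique_multi(filter_hits(hits, lengths))
```

In this toy run only m1 is uniquely placed, m2 is multi-mapped, and m3 and m4 would be dropped as markers with unknown position.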

Genetic structure, diversity, and identification of outlier SNPs
The genetic structure of the Silybum core collection was investigated by various methods for comparison. As a first description of the data, a Principal Component Analysis (PCA) was performed using the GAPIT3 [42] package in R, after filtering out non-polymorphic SNPs. PCA plots were created in R using the ggplot2 package [43]. Then, population structure was inferred using the non-parametric method available in the AWclust software [44]. AWclust calculates the genetic distance between every pair of individuals as an ASD (Allele Sharing Distance) matrix and clusters individuals by applying Ward's minimum-variance cluster analysis (R square = D2) to it. Using the Gap statistic, AWclust also estimates the optimal number of groups (K) based on the sample genetic relatedness [45]. Finally, Admixture version 1.23 [46] was used to define the population structure using the following parameters: 10-fold Cross-Validation (CV) for subpopulations (K) ranging from K = 1 to 16 and 1,000 bootstrap replicates. CV scores were used to determine the best K value. Each genotype was assigned to a specific group when the membership coefficient (qi) was higher than 0.60, whereas individuals with qi lower than 0.5 at each K were considered as admixed. Pairwise genetic distance between subpopulations was estimated as Weir and Cockerham's average FST using Plink [47]. Nei's gene diversity (H), Shannon Index (I), and the percentage of private alleles were estimated using Genalex v.6.5 [48].
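Two steps of this workflow lend themselves to a short sketch: the allele sharing distance that feeds the Ward clustering, and the assignment of individuals to groups from their membership coefficients. The Python below is illustrative only and does not reproduce AWclust or Admixture. It assumes genotypes coded as alternate-allele counts (0, 1, 2, with None for missing calls), scales the distance to the range 0 to 1, and uses a single 0.60 cutoff below which an individual is labelled admixed; the function names are hypothetical.

```python
def asd(g1, g2):
    """Allele sharing distance between two individuals, scaled to [0, 1].

    Per locus the distance is 0 when both alleles are shared, 0.5 when one
    is shared and 1 when none is shared; loci with a missing call are skipped.
    """
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2)
             if a is not None and b is not None]
    if not diffs:
        return float("nan")
    return sum(diffs) / len(diffs)


def asd_matrix(genotypes):
    """Symmetric matrix of pairwise distances for a list of individuals."""
    n = len(genotypes)
    return [[asd(genotypes[i], genotypes[j]) for j in range(n)] for i in range(n)]


def assign_group(q, threshold=0.60):
    """Assign an individual to the group with the largest membership
    coefficient, or label it admixed if that coefficient is not above
    the threshold (single-cutoff simplification)."""
    best = max(range(len(q)), key=lambda k: q[k])
    return f"Pop{best + 1}" if q[best] > threshold else "admixed"


# Toy data: three individuals, four loci.
a = [0, 0, 2, 2]
b = [0, 1, 2, 2]
c = [2, 2, 0, None]
matrix = asd_matrix([a, b, c])
```

The resulting matrix could then be passed to any hierarchical clustering routine that implements Ward's criterion.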

Phenotypic differentiation based on population structure and divergent DArT
Phenotypic data previously reported by Martinelli et al. [5,41] for the same collection investigated here were downloaded and grouped based on the genetic structure identified in this study (S1 Table). For each group, means and variance distributions for each phenotypic trait were calculated and significant differences were assessed using a pairwise T-test implemented in the R environment [51]. Similarly, the allelic effect of DArTs under natural selection identified by Bayescan 1.2 [49] was investigated. In particular, we divided the collection into two groups according to the genotypic profile at each marker to test whether the mean of phenotypic traits was significantly different (T-test; p-value ≤ 0.01).
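The allelic-effect comparison can be illustrated with a small Python sketch: samples are split by their allele at one marker and the two trait means are compared. The text does not say which T-test variant was used in R, so the Welch (unequal-variance) form is an assumption here; the sample names and trait values are invented, and the code only returns the test statistic and degrees of freedom, from which a p-value would be read off the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def split_by_allele(genotypes, phenotypes):
    """Group trait values by the allele each sample carries at one marker."""
    groups = {}
    for sample, allele in genotypes.items():
        if allele is not None and sample in phenotypes:
            groups.setdefault(allele, []).append(phenotypes[sample])
    return groups


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy data: hypothetical samples, alleles at one marker, and a trait value.
marker = {"s1": "A", "s2": "A", "s3": "A", "s4": "G", "s5": "G", "s6": "G"}
oleic = {"s1": 30.0, "s2": 32.0, "s3": 34.0, "s4": 20.0, "s5": 22.0, "s6": 24.0}

groups = split_by_allele(marker, oleic)
t_stat, dof = welch_t(groups["A"], groups["G"])
```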

Germplasm collection and genetic characterization
Out of roughly 5,200 DArTseq markers received from DArT Pty Ltd, 3,629 were mapped in unique regions of the 17 S. marianum chromosomes [3] (Fig 1), whereas 386 were defined as multimapped, for a total of 5,178 polymorphisms. DArTs were distributed across all chromosomes, ranging from 148 on chromosome 15 to 567 on chromosome 4, with an average of 300 DArT sequences per chromosome (Fig 2). Seventeen additional probes were instead located on eight contigs, with ctg000020 and ctg000550 harboring the highest number (four DArTs) and ctg000400, ctg000410 and ctg000490 the lowest (S2 Table).
Similarly, kinship analysis revealed three clusters, perfectly consistent with the PCA populations (Fig 3B). The AWclust [44] (Fig 3C) and Admixture [46] (Fig 3D) analyses supported the same three-cluster structure.
The AMOVA revealed much greater variation within populations (66%) than among populations (34%), confirming low genetic differentiation among the subpopulations but high genetic variation within them (Table 2).

Phenotypic differentiation based on genetic classification
Phenotypic diversity for quality traits such as silymarin and oil constituents, seed morphological parameters and other agronomically relevant traits was previously investigated [5], as detailed in S1 Table. Here, we assessed the relatedness of the identified population structure with the measured traits (Fig 4).
In terms of flavonolignan content, wide variability was observed for each constituent [5]. Pop3 is characterized by higher levels of total silymarin (p-value < 0.05), taxifolin (p-value < 0.05) and isosilybin A (p-value < 0.05) compared to the other populations (Fig 4B). No significant difference was observed among the three populations for silycristin content (p-value = 0.213), silydianin content (p-value = 0.122), or silybin A (p-value = 0.227) and B (p-value = 0.202) content (Fig 4B); a positive correlation among these constituents was previously reported [5].
The evaluation of agronomic and seed morphological traits in the frame of the population structure revealed that Pop3 showed a higher content of carbon (p-value < 0.05) compared to the other populations (Fig 4C).
Overall, the three populations classified with DArTseq markers showed different phenotypic means for many of the traits previously measured. Interestingly, the highest phenotypic variability in terms of both oil and silymarin content was found in Pop3 compared to Pop1 and Pop2, suggesting that environmental selective pressure may have made these phenotypes more favorable in Italy.
The function of the identified outliers located within annotated genes was inferred using a BLAST best-hit approach. Several outliers were found in S. marianum genes mainly encoding signaling proteins and transporters involved in biological functions or primary metabolism-related functions [52-56]. Interestingly, the DArT "5872821" was found within the gene Smar02g039390, the putative ortholog of CcPHR1-like, which encodes a phosphate starvation response regulator active under limited phosphorus availability [55]; this gene family has been characterized in diverse genera of the family Gramineae and can be linked to selection under diverse environmental conditions.
The DArT "5874342" matched Smar01g005980, orthologous to an S-adenosylmethionine uptake transporter, whereas the DArT "5870307" was found spanning the gene Smar05g013800, an ortholog of the gene encoding Receptor-Like Kinase 2 (LKY2), which is involved in elicitor-mediated biotic responses [54]. Furthermore, the DArT "5869938" falls into Smar03g006250, a gene locus encoding TBCC domain-containing proteins, known to be involved in organ development and vascularization in diverse plant species [53]. The gene locus Smar05g040540, associated with the DArT marker "5871636", could also be involved in plant environmental adaptation, according to the function of its putative ortholog in Drosophila melanogaster [52].

Allelic effect of DArT under natural selection on different phenotypic traits
Among the DArTs identified by Bayescan (Table 3), seven loci with high allele frequency (>0.9) in Pop3 showed a significant effect (p-value < 0.0001) on linoleic and oleic acid contents (Fig 6). Interestingly, the favourable haplotype for higher linoleic acid content (AGAACGC) was almost fixed in Pop3 (81.48%), whereas the unfavourable haplotype (GTGGTTT) was abundant in both Pop1 (79.24%) and Pop2 (95.85%) (S4 Table). In addition, an opposite trend was observed for the same haplotype for oleic and behenic acid content (Fig 5), confirming the negative relationship between oleic and linoleic acid content.
Four loci ("5871854", "5870087", "5869938" and "5871972"), all shared with those identified for linoleic and oleic acid contents, were also significant for total carbon content, with the favourable haplotype (GTTT) almost fixed in Pop3. In contrast, four loci ("5870307", "5871854", "5871636" and "5871972") slightly affected seed length, with the favourable haplotype (AACG) being almost fixed in Pop1 and Pop2 but not in Pop3. Among the selected DArTs, "5871854" was the only one showing a significant but small effect on seed area and arachidonic acid content, whereas the locus "5873408" slightly affected taxifolin content.
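The per-population haplotype percentages reported above can be obtained by concatenating each sample's alleles across the selected loci and counting. The following Python sketch shows one way to do this. It assumes one homozygous allele call per sample and locus, skips samples with a missing call, and uses invented locus names, samples and alleles rather than the study's data.

```python
from collections import Counter, defaultdict


def haplotype_frequencies(alleles, population, loci):
    """Percentage of each multi-locus haplotype within each population.

    alleles:    {sample: {locus: allele or None}}
    population: {sample: population label}
    loci:       ordered list of loci defining the haplotype string
    """
    counts = defaultdict(Counter)
    for sample, calls in alleles.items():
        if any(calls.get(locus) is None for locus in loci):
            continue  # skip samples with a missing call at any locus
        haplotype = "".join(calls[locus] for locus in loci)
        counts[population[sample]][haplotype] += 1
    return {
        pop: {hap: 100.0 * n / sum(c.values()) for hap, n in c.items()}
        for pop, c in counts.items()
    }


# Toy data: three hypothetical loci and six hypothetical samples.
loci = ["locus1", "locus2", "locus3"]
alleles = {
    "s1": {"locus1": "A", "locus2": "G", "locus3": "A"},
    "s2": {"locus1": "A", "locus2": "G", "locus3": "A"},
    "s3": {"locus1": "G", "locus2": "T", "locus3": "G"},
    "s4": {"locus1": "G", "locus2": "T", "locus3": "G"},
    "s5": {"locus1": "G", "locus2": "T", "locus3": "G"},
    "s6": {"locus1": "G", "locus2": None, "locus3": "G"},
}
population = {"s1": "Pop3", "s2": "Pop3", "s3": "Pop3",
              "s4": "Pop1", "s5": "Pop1", "s6": "Pop1"}

freqs = haplotype_frequencies(alleles, population, loci)
```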

Discussion
This study examined the genetic structure of an ex-situ S. marianum collection through both parametric and non-parametric approaches and indicated that individuals could be grouped into three distinct groups (Pop1-3), largely reflecting their geographical origins. Unexpectedly, in some instances, samples derived from different plants belonging to the same accession clustered in different groups, suggesting that, despite being maintained and reproduced in different gene banks, the accessions may still retain the residual heterogeneity they had when originally collected from the wild. Interestingly, accessions with non-variegated leaves (G5, G17, and G18) were included in Pop2, except SIL3 (S. eburneum accession) which, despite its non-variegated leaves, grouped into Pop1. This unexpected result suggests a misclassification of SIL3 as S. eburneum by the original seed collector, likely due to the absence of variegation on its leaves. Contrary to what was previously stated [4], the absence of leaf variegation is not a distinctive feature of S. eburneum, given that this trait is not mentioned in the botanical description of the species [1]. A more in-depth analysis, encompassing both genetic and phenotypic characterization of additional S. eburneum accessions, will clarify the classification of SIL3, which is more likely an S. marianum accession, and allow us to better define species classification in the genus Silybum. Given the geographical clustering obtained here, and considering that S. marianum is only naturalized in North and South America, New Zealand, and Australia [57], we may hypothesise that the species was introduced to Canada from Poland. This is a suggestive hypothesis that should be validated with larger datasets.
Comparing these results with the phenotypic ones reported by Martinelli et al. [5] on the same genetic materials highlighted the discriminating power of the SNP markers. Indeed, Martinelli et al. [5] did not identify distinctive groups in the same S. marianum collection by using morphological and biochemical data only (e.g. fruit morphology, total oil content, oil fatty acid profile, taxifolin, flavonolignan content), with Italian accessions (Pop3) being distributed across three out of nine distinct clusters. That clustering analysis effectively distinguished between accessions with silymarin chemotypes A and B. Specifically, five clusters grouped genotypes with chemotype A, while another three grouped those with chemotype B. In contrast, the clustering based on genomic markers is not able to separate the different silymarin chemotypes (S5 Table). This could suggest that this important phenotypic trait, known to be genetically inherited [58], is probably not associated with the phyletic origin of the accessions, but unevenly spread in world germplasm. These findings contrast with those of Shokrpour et al. [59], who clustered accessions according to their origin by analyzing the morphological characteristics and flavonolignan properties of 32 milk thistle ecotypes collected from northern and southern regions of Iran, suggesting in that case a differentiation probably associated with the different geomorphology of the Iranian regions.
The grouping of milk thistle accessions according to their geographic origins by using molecular markers was also reported by Mohammadi et al. [23]. The authors used AFLP markers to assess the molecular diversity in 32 populations of S. marianum collected from seven provinces of Iran and identified three major groups consistent with their geographical grouping, with only a few exceptions. Correspondence between genetic and geographical distance was also reported for other plants, such as C. odorata specimens [60], using DArT SNP markers. The authors adopted a target capture method coupled with short-read sequencing to identify spatially informative SNPs that differentiate species based on latitude, temperature, and precipitation.
Regarding the diversity indices considered in this study, Nei's genetic diversity (H, 0.17) and Shannon diversity Index (I, 0.27), our results are consistent with those obtained by Mohammadi et al. [23], where H and I were 0.20 and 0.29, respectively, and by Saghalli et al. [24], where H and I were 0.33 and 0.49, respectively. In contrast, these results differ from those of Rafizadeh et al. [22], where the average H was 0.72 and the average I was 0.83. Specifically, Rafizadeh et al. [22] investigated 80 S. marianum genotypes from 8 populations in Iran.
Their H and I values are higher, indicating greater genetic diversity than in our study, which seems to be attributable to within-group (58%) rather than between-group variation (42%). The same authors highlighted that various factors, including genetic drift, mutation, and natural selection, along with genetic marker systems, could impact genetic differentiation [22].
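For readers comparing these values, the two indices can be written down compactly. The Python sketch below uses the textbook per-locus definitions for a biallelic marker, H = 1 - (p^2 + q^2) and I = -(p ln p + q ln q), averaged over loci. It is an illustration under those assumptions and does not reproduce GenAlEx's exact conventions or any sample-size correction; the allele frequencies are invented.

```python
from math import log


def nei_h(p):
    """Nei's gene diversity at a biallelic locus with allele frequency p."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def shannon_i(p):
    """Shannon information index at a biallelic locus (natural log)."""
    return -sum(x * log(x) for x in (p, 1.0 - p) if x > 0)


def mean_index(allele_freqs, index):
    """Average a per-locus index over a list of allele frequencies."""
    return sum(index(p) for p in allele_freqs) / len(allele_freqs)


# Toy allele frequencies for four hypothetical loci.
freqs = [0.5, 0.9, 1.0, 0.2]
H = mean_index(freqs, nei_h)
I = mean_index(freqs, shannon_i)
```

Both indices are zero at a fixed locus and reach their maximum (0.5 for H, ln 2 for I) when the two alleles are equally frequent, which is why collections dominated by near-fixed markers yield low averages.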
Based on the pairwise fixation index (FST), Pop3 showed greater differentiation compared to the other subpopulations, consistent with other diversity indices such as private alleles and heterozygosity. The higher differentiation of Pop3 is notably evident when observing the phenotypic distribution based on genetic clustering, since a divergent pattern for oleic, arachidonic, behenic, and linoleic acid content was observed compared to the other two groups. The oil content found in Pop3 is comparable with that identified in the five species most used for oil production (e.g., sunflower, peanut, rapeseed, mustard, and olive) and cultivated in Eastern Europe, suggesting that milk thistle could also be a viable vegetable oil source [61]. Pop3 also exhibited a higher total carbon content and smaller seed area, width, and length, all phenotypic traits important for future breeding programs. However, it is important to note that Italian accessions were overrepresented in our collection, and this sampling imbalance may have influenced these results.
Therefore, although further studies are needed, Bayescan analysis, a widely used method for detecting loci under selection [49], allowed us to identify seven loci fixed within Pop3 and probably influencing the phenotypic traits described above. This opens an interesting scenario in which the beneficial haplotypes identified might be the basis for developing milk thistle lines with higher levels of oleic, arachidonic, and behenic acids, and lower levels of linoleic acid, paving new avenues for enhancing the nutritional and agronomic characteristics of milk thistle. For instance, Pearson et al. [62] used the Bayescan method to identify SNPs associated with changes in foliar water-soluble carbohydrate levels in 935 Trifolium repens L. individuals. Among the 33 SNPs detected, one was found within the intron of ERD6-like 4, a gene encoding a sugar transporter on the vacuole membrane, prompting further investigation into these genomic regions. Additionally, a recent study on Helichrysum italicum (Roth) G. Don led to the identification of four AFLPs strongly associated with bioclimatic variables [63], offering Asteraceae breeders an opportunity to enhance various traits through marker-assisted selection. Indeed, incorporating genetic material from individuals carrying the selected loci into milk thistle breeding populations can potentially enhance desired traits, especially using Italian accessions (Pop3) as donors. However, given the small population size, it will be important to strengthen our results with molecular validation and with de novo sequencing of a larger number of accessions, thus providing a deeper understanding of the genetic basis of important traits and enhancing the success of breeding programs for milk thistle.

Conclusions
Understanding the genetic diversity of minor species such as S. marianum, still only partially domesticated and little studied, is a fundamental step in exploiting their genetic resources. It also plays a significant role in designing efficient plant breeding programs and determining which genotypes to cross for developing new populations. The present study indicates that there is potential to enhance milk thistle for desirable traits through genetic variation. DArTseq has proven to be a robust and proficient tool for producing large numbers of informative markers that reveal population structure and genetic differentiation in our germplasm collection. A total of twenty-two markers were identified as putatively under natural selection. Among these, seven SNP markers probably exerted significant effects on various phenotypic traits. These SNP markers, if appropriately validated, represent a good tool for starting a milk thistle breeding program aimed at expanding the use of the plant for food and non-food uses.

Fig 1. Leaf, stem and flower biodiversity of the ex-situ Silybum marianum collection used in this study. A) G20 stem. B) G8 striated stem. C) G9 white inflorescence. D) G5 non-variegated leaf. E) G20 purple inflorescence. https://doi.org/10.1371/journal.pone.0308368.g001

The AWclust [44] (Fig 3C) and Admixture [46] (Fig 3D) analyses also supported the population structure described by the PCA plot and kinship, confirming that the germplasm collection in this study could be divided into three clusters (K = 3), probably reflecting their geographical origin. Indeed, Pop1 contained accessions from Central Europe and UK, Pop3 was mainly constituted by Southern Europe-derived accessions, mainly from Italy, and Pop2 included accessions from different regions, such as Canada, Poland, and Belgium. Genetic differentiation among the three identified groups (Pop1, Pop2, and Pop3) was investigated by computing pairwise fixation index (FST) values. Our findings showed that the genetic differentiation was lower between Pop1 and Pop2 (FST = 0.36) and between Pop1 and Pop3 (FST = 0.37), and higher between Pop2 and Pop3 (FST = 0.45) (Fig 3), while Nei's gene diversity (H) and Shannon Index (I) were 0.17 and 0.27, respectively (S3 Table). A higher percentage of private alleles and expected heterozygosity was also detected in Pop3 (0.16% of private alleles and 0.25 of expected heterozygosity) compared with the other subpopulations. Specifically, Pop3

Fig 2. SNP density plot showing the number of variants within 1 Mb window size along the S. marianum genome. The horizontal axis shows the chromosome length (Mb); the different colours depict SNP density. https://doi.org/10.1371/journal.pone.0308368.g002

Fig 3. Population structure of Silybum accessions using DArTseq technology. A) Principal Component Analysis (PCA) using high-quality SNP markers. Samples are colored based on their grouping. B) Heat map of the kinship matrix created using GAPIT3 [42]. The color histogram indicates the distribution of coefficients of co-ancestry, with the stronger red color showing individuals more related to each other. C) Dendrogram obtained through non-parametric hierarchical clustering. D) Bar plot describing the population Admixture by the Bayesian approach. Each individual is represented by a thin horizontal line, which is partitioned into K colored segments whose length is proportional to the estimated membership coefficient (q). The population was divided into three (K = 3) groups according to the most informative K value. The colors indicate the accession membership to the groups identified with the Bayesian analysis. https://doi.org/10.1371/journal.pone.0308368.g003

Pop3, constituted by accessions from Southern Europe and mainly from Italy, was characterized by a higher percentage content of oleic acid (p-value < 0.05), arachidonic acid (p-value < 0.05), behenic acid (p-value < 0.05), stearic acid (p-value = 0.068) and lignoceric acid (p-value < 0.05), and by lower levels of linoleic acid (p-value < 0.05) than the other two populations (Fig 4A). Interestingly, Pop2, characterized by Canadian, Polish, and Belgian accessions, showed a higher total fatty acid content (p-value < 0.05) and palmitic acid content (p-value < 0.05) than the other two populations. Moreover, no significant difference among the three populations was observed for the percentage content of gadoleic acid (p-value = 0.282) (Fig 4A).

Fig 4. Phenotypic distribution of oil constituents (A), silymarin components (B) and fruit morphological parameters (C) among the three identified groups (Pop1, Pop2, and Pop3) of the S. marianum collection. Boxplots represent the distribution of each trait, with the central line indicating the median, the box edges representing the first and third quartiles, and the whiskers extending to 1.5 times the interquartile range. Outliers are represented by individual black points. p-values are displayed below each boxplot. https://doi.org/10.1371/journal.pone.0308368.g004

Compared to the other populations, Pop3 showed a higher carbon content (p-value < 0.05) and lower values of thousand seed weight (TSW) (p-value < 0.05), seed area (p-value < 0.05), seed width (p-value < 0.05), and seed length (p-value < 0.05) (Fig 4C). Interestingly, Pop2 is significantly different from Pop1 and Pop3 for seed color (p-value < 0.05) (Fig 4C).

Fig 6. Boxplot of DArTs putatively under natural selection with significant effects (p-value < 0.0001) on phenotypic traits: oil constituents in orange, fruit morphological parameters in blue, and silymarin components in green. For each selected DArT, the germplasm lines were divided into two groups according to their genotypic state (homozygous for reference or alternate allele). The X-axis represents the two alleles for each DArT, while the Y-axis corresponds to the mean of the selected phenotypic trait. Boxplots represent the distribution of each trait, with the central line indicating the median, the box edges representing the first and third quartiles, and the whiskers extending to 1.5 times the interquartile range. Outliers are represented by individual points. https://doi.org/10.1371/journal.pone.0308368.g006

Table 1. List of S. marianum accessions used in the present study.
Accession number; DArT sample code; Accession origin: ISO code of the country where the accession was originally collected; Species; Accession description; Donor code: FAO code of donor institutions; and donor accession number are shown.